Forgetting in Reinforcement Learning Links Sustained Dopamine Signals to Motivation
Abstract
Dopamine (DA) has been proposed to represent the reward-prediction error (RPE) defined in reinforcement learning, and therefore to respond to unpredicted but not predicted reward. However, recent studies have found sustained DA responses towards predictable reward in tasks involving self-paced behavior, and suggested that this response represents a motivational signal. We have previously shown that RPE can be sustained if there is decay/forgetting of learned values, which can be implemented as decay of the synaptic strengths storing those values. This account, however, did not explain the suggested link between tonic/sustained DA and motivation. In the present work, we explored the motivational effects of value-decay in self-paced approach behavior, modeled as a series of 'Go' or 'No-Go' selections towards a goal. Through simulations, we found that, counterintuitively, value-decay can enhance motivation, specifically by facilitating fast goal-reaching. Mathematical analyses revealed two potential underlying mechanisms: (1) decay-induced sustained RPE creates a gradient of 'Go' values towards a goal, and (2) value-contrasts between 'Go' and 'No-Go' are generated because chosen values are continually updated while unchosen values simply decay. Our model provides potential explanations for the key experimental findings that suggest DA's roles in motivation: (i) slowdown of behavior by post-training blockade of DA signaling, (ii) observations that DA blockade severely impairs effortful actions to obtain rewards while largely sparing seeking of easily obtainable rewards, and (iii) relationships between the reward amount, the level of motivation reflected in the speed of behavior, and the average level of DA. These results indicate that reinforcement learning with value-decay, or forgetting, provides a parsimonious mechanistic account of DA's roles in value-learning and motivation.
Our results also suggest that when biological systems for value-learning are active even though learning has apparently converged, the systems might be in a state of dynamic equilibrium, where learning and forgetting are balanced.
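The kind of model the abstract describes can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `simulate`, the linear track of 7 states, the SARSA-style update, the softmax (logistic) choice rule, and every default parameter value are hypothetical choices made for illustration. It shows an agent choosing 'Go' or 'No-Go' at each time step, with all stored values decaying after every step.

```python
import math
import random

GO, NOGO = 0, 1


def simulate(n_states=7, n_trials=200, alpha=0.5, beta=5.0, gamma=1.0,
             decay=0.01, reward=1.0, seed=0):
    """Toy self-paced approach task with 'Go'/'No-Go' choices and value decay.

    All defaults are illustrative assumptions, not values taken from the paper.
    Returns (steps_per_trial, goal_rpes, q).
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # q[s][a]: learned value of action a in state s (the goal state is terminal).
    q = [[0.0, 0.0] for _ in range(n_states)]
    steps_per_trial, goal_rpes = [], []

    def choose(s):
        # Softmax over two actions reduces to a logistic function of the contrast.
        p_go = 1.0 / (1.0 + math.exp(-beta * (q[s][GO] - q[s][NOGO])))
        return GO if rng.random() < p_go else NOGO

    for _ in range(n_trials):
        s, a, steps = 0, choose(0), 0
        while True:
            # 'Go' advances one state towards the goal; 'No-Go' stays put.
            s_next = s + 1 if a == GO else s
            steps += 1
            if s_next == goal:
                r, a_next, next_value = reward, None, 0.0
            else:
                a_next = choose(s_next)
                r, next_value = 0.0, q[s_next][a_next]
            # SARSA-style reward-prediction error for the action just taken.
            rpe = r + gamma * next_value - q[s][a]
            q[s][a] += alpha * rpe
            # Forgetting: every stored value decays a little on each time step.
            for row in q:
                row[GO] *= 1.0 - decay
                row[NOGO] *= 1.0 - decay
            if a_next is None:
                goal_rpes.append(rpe)  # RPE at the moment of reward
                break
            s, a = s_next, a_next
        steps_per_trial.append(steps)
    return steps_per_trial, goal_rpes, q
```

Calling `simulate(decay=0.01)` and `simulate(decay=0.0)` and comparing the mean of `steps_per_trial` over the later trials gives a rough check of whether forgetting speeds goal-reaching in this toy setting. In this sketch the RPE at reward stays positive whenever `decay > 0`, because the value of the final 'Go' action is pulled below the reward size on every step, whereas with `decay=0.0` it shrinks towards zero as learning converges.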
Similar resources
The Cognitive Neuroscience of Motivation and Learning
Recent advances in the cognitive neuroscience of motivation and learning have demonstrated a critical role for midbrain dopamine and its targets in reward prediction. Converging evidence suggests that midbrain dopamine neurons signal a reward prediction error, allowing an organism to predict, and to act to increase, the probability of reward in the future. This view has been highly successful i...
Survey of effective factors on learning motivation of clinical students and suggesting the appropriate methods for reinforcement the learning motivation from the viewpoints of nursing and midwifery faculty, Tabriz University of Medical Sciences 2002.
Introduction. Motives are a powerful force in the education-learning process, so much so that even the richest and best training plans and structured education are ineffective when motivation is lacking. Since the success of a teacher depends on the learning motivation of students, it is necessary for teachers to know the effective methods for motivating the students and t...
Pii: S0306-4522(98)00697-6
This study investigated how the simulated response of dopamine neurons to reward-related stimuli could be used as reinforcement signal for learning a spatial delayed response task. Spatial delayed response tasks assess the functions of frontal cortex and basal ganglia in short-term memory, movement preparation and expectation of environmental events. In these tasks, a stimulus appears for a sho...
Dynamic shaping of dopamine signals during probabilistic Pavlovian conditioning.
Cue- and reward-evoked phasic dopamine activity during Pavlovian and operant conditioning paradigms is well correlated with reward-prediction errors from formal reinforcement learning models, which feature teaching signals in the form of discrepancies between actual and expected reward outcomes. Additionally, in learning tasks where conditioned cues probabilistically predict rewards, dopamine n...
A Model of Dopamine and Uncertainty Using Temporal Difference
Does dopamine code for uncertainty (Fiorillo, Tobler & Schultz, 2003; 2005) or is the sustained activation recorded from dopamine neurons a result of Temporal Difference (TD) backpropagating errors (Niv, Duff & Dayan, 2005)? An answer to this question could result in a better understanding of the nature of dopamine signaling, with implications for cognitive disorders, like Schizophrenia. A comp...